Shorter planning depth and higher response noise during sequential decision-making in old age

Forward planning is crucial to maximize outcome in complex sequential decision-making scenarios. In this cross-sectional study, we were particularly interested in age-related differences of forward planning. We presumed that especially older individuals would show a shorter planning depth to keep the costs of model-based decision-making within limits. To test this hypothesis, we developed a sequential decision-making task to assess forward planning in younger (age < 40 years; n = 25) and older (age > 60 years; n = 27) adults. By using reinforcement learning modelling, we inferred planning depths from participants' choices. Our results showed significantly shorter planning depths and higher response noise for older adults. Age differences in planning depth were only partially explained by well-known cognitive covariates such as working memory and processing speed. Consistent with previous findings, this indicates age-related shifts away from model-based behaviour in older adults. In addition to a shorter planning depth, our findings suggest that older adults also apply a variety of heuristical low-cost strategies.


Scientific Reports
| (2023) 13:7692 | https://doi.org/10.1038/s41598-023-33274-0 www.nature.com/scientificreports/ suggests that older adults might not have planned far ahead enough to find the shorter sequence. However, these tasks do not directly assess planning depth. Moreover, they address a specific problem-solving scenario with a deterministic environment and a single given target state, which limits external validity. Secondly, the relative influence of model-based and model-free control on human decision-making has been investigated in rewardbased sequential decision-making tasks like the two-stage Markov task 9 . While these studies found a decreased contribution of model-based control to choices of older adults [10][11][12] , they also do not differentiate computational parameter values of model-based control like planning depth. Nevertheless, both strands of evidence suggest that older compared to younger adults demonstrate reduced forward planning capabilities. Additionally, forward planning involves several fluid cognitive abilities like working memory, processing speed and executive control 13 . These have been shown to influence model-based learning 11,[14][15][16] and are classical examples for significant cognitive decline in old age starting from early adulthood 5 . In related neuropsychological tasks, older adults usually demonstrated both, an overall slowing as well as less accurate responses 17 . Previous findings of lower performance in model-based control in old age might be confounded with these general cognitive abilities. Similarly, they might also explain differences in planning depth. We therefore included indicators of working memory and processing speed in assessments of forward planning as potential covariates.
In order to assess potential age-related differences in planning depth between younger and older adults, we designed a sequential decision-making task, the Space Adventure Task (SAT), which required participants to use model-based planning in order to make beneficial choices. The task further allowed to differentiate between different planning depths based on participants' choices. Choices in the SAT were modelled with a reinforcement learning (RL) agent model which in turn allowed us to infer planning depths with hierarchical Bayesian inference of free model parameters (see Methods section for details). To motivate participants, deeper planning yielded higher rewards. Moreover, we varied the degree of randomness in state transitions, i.e. the predictability of outcomes, to explore how participants adapt planning when facing different levels of uncertainty. To investigate to what extent differences in planning depth could be explained by classical constructs of cognitive performance, participants underwent a neuropsychological assessment for processing speed and spatial working memory.
Based on previous research, we hypothesized that in the SAT, older compared to younger adults should demonstrate reduced forward planning capabilities indicated by lower scores and a lower inferred planning depth. Regarding outcome certainty, we presumed that a less predictable environment should lead to a reduction in planning depth. Finally, we expected performance measures for the two assessed general cognitive abilities to be positively associated with planning depth while not fully explaining group differences in planning depth.

Methods
Participants. Twenty-seven older adults (13 women, age above 60 years: M = 68.8) and twenty-five younger adults (7 women, age below 40 years: M = 26.4) took part in the experiment. The study was conducted as an associated experiment for the larger research project "Aging and neuromodulation of forward planning under uncertainty" in the collaborative research centre funded by the German Research Foundation (DFG SFB 940). Prior to the appointment, participants were telephone screened for potential exclusion due to psychiatric or neurological illness. The assessment consisted of a sociodemographic questionnaire, an eyesight test, a neuropsychological test battery and the SAT. The overall procedure took around 2-2.5 h and participants were compensated with 20 Euros plus a maximum of additional 10 Euros depending on their performance in the SAT. Ethic approval in accordance with the Helsinki declaration was granted by the ethic committee of the TU Dresden, Germany (EK 514122018). All participants signed informed consents before the start of study participation. Demographic characteristics and basic cognitive covariates are depicted in Table 1. Age ranged from 17 to 38 years for younger adults and from 61 to 75 years for older adults. Groups did not differ significantly in gender distribution according to Pearson's chi-squared test, χ 2 (1) = 2.23, p = .136.
Cognitive tests. Cognitive covariates were assessed with three computerized tasks. The Spot-a-Word Test (SAW) examined verbal knowledge as an indicator for crystalline intelligence and the Identical Pictures Task (IDP) measured perceptual processing speed as an indicator for fluid intelligence 18 . Thirdly, the first subtask (location memory condition across both load levels) of the Spatial Working Memory Task (SWM) was used to test for spatial abilities 19 . For brief descriptions of the tasks, see Figs. S1-S3. In each test, reaction time (RT) and number of correct responses were measured. Trials with reaction time below 150 ms were excluded. Performance was calculated as achieved percentage of maximum possible number of correct responses. In line with previous evidence 5 , independent t-tests of group means revealed that older adults showed a significantly lower performance in the tasks associated with fluid intelligence, the SWM (t(32.06) = 3.70, p < .001) and the IDP (t(50) = 7.62, p < .001). Similarly, as expected, older adults showed superior results for crystalline intelligence in the SAW, as indicated by a Mann-Whitney-U test (used because of non-normal distributions; Z = − 2.47, p = .013).
Space adventure task. The SAT is a sequential decision-making task based on a task first described by Huys et al. 20 . Participants navigated a spaceship through various planetary systems. Each system was a configuration of six planets, each being one out of five possible planet types. Participants had to spend fuel to travel between planets but could gain fuel by arriving at specific planet types (Fig. 1A). The task goal was to accumulate as much fuel as possible throughout the experiment consisting of the same sequence of N = 100 mini-blocks for every participant.
In each mini-block, participants were presented with a specific planet configuration and starting position of their spaceship. A bar at the top indicated the current amount of fuel accumulated so far, as well as the number of remaining actions that can be performed during the mini-block (Fig. 1B) www.nature.com/scientificreports/ a way that participants had to use forward planning to find the route leading to the maximum possible fuel gain within a specific number of actions (two or three steps). At each step, participants could choose to either (i) move to the next planet in a clockwise fashion, or to (ii) jump to a specific non-neighbouring planet. Travelling to the next planet into clockwise direction cost two units and jumping cost five units of fuel. The target planets for jumping were determined by a given travel pattern (Fig. 1C). Moreover, while moving clockwise was a deterministic action, jumping was uncertain, i.e. jumping was successful only with a specific probability. In case of jump failure, the jump led to one of the two neighbouring planets of the target planet, each with equal probability.
In addition to the available numbers of actions, the jump (transition) uncertainty was varied among miniblocks. In the low noise condition, jumping succeeded with a 90% probability, while in high noise mini-blocks (indicated by asteroids in the background) it succeeded with a 50% probability. The experiment had a 2-by-2 factorial design with the factors 'steps' (two or three total actions) and 'noise' (low or high uncertainty). The resulting four conditions were assigned to four phases with 25 mini-blocks each. The order of these phases was counterbalanced between participants (see Fig. S4). For each mini-block, choice and RT data was acquired as well as the amount of accumulated fuel as points.
The SAT was implemented in MathWorks MATLAB R2017a and run on a standard PC. Participants controlled the experiment with a computer keyboard. The move action was selected with the right arrow key and the jump action with the 'S' key. Prior to the experiment, participants underwent an extensive training: they were informed about the goal and conditions of the task and were instructed to carefully look for the optimal route of actions to choose in each mini-block. Moreover, they were informed about the effect of jumping uncertainty, but they were not given explicit probabilities but learned success/failure probabilities over time (see also next section). Participants were also tested how well they had memorized the travel pattern for jumps with feedback. Finally, they familiarized themselves with the task procedure during 20 training mini-blocks, 5 mini-blocks per condition. More details on the training procedure can be found in the supplementary material.
Computational model. We modelled participants' action choices with a mixture model of three single model-based RL agent models with planning depth of 1, 2 and 3 respectively. Each agent had an optimal model of the environment, i.e. it was completely informed about the rules of the task. This environment model entailed the set of available actions A = ′ move ′ , ′ jump ′ and states S (the planet configuration of the mini-block), the transition probabilities p(s t+1 |s t , a t ) for reaching a subsequent state s t+1 from a given state s t with action a t as well as the immediate reward of reaching each state r(s) indicating the planet types of the current configuration. Here, t denotes the trial within a mini-block. To choose the optimal action for a specific state in a specific miniblock, the agents computed the expected cumulative reward for executing each action with an optimal forward planning algorithm (value iteration algorithm; a detailed formulation can be found in the supplementary material) only limited by their planning depth.
Value iteration outputs the expected cumulative reward for executing each action a in the current state s with planning horizon d , which is called the state-action-or Q-values. These Q-values were the essential values for action selection. The higher the relative value of an action was, the higher should have been the probability of selecting that action. Action selection was therefore modelled probabilistically with a softmax function, one of the most universal assumption in the model-based reinforcement learning literature 21 . For our case of two available actions, this corresponded to a sigmoid transformation σ (x) of the difference between the Q-values, �Q(s t , d) . Choice probabilities were thus defined as: www.nature.com/scientificreports/ Here, the inverse temperature beta ( β ∈ [0, +∞] ) controlled the extent to which differences in Q-values affected action selection. Higher values of beta represented higher probability to select the action with the highest Q-value. If beta = 0 , actions are selected with a constant probability independent of outcomes which helped identify individuals which might have ignored the experimental instructions. The parameter theta ( θ ∈ [−∞, +∞] ) denoted an a priori response bias, where negative values implied a bias towards choosing the deterministic 'move' action, which was incorporated to capture a potential risk-averse tendency of individuals 22 .
Since the state transition probabilities for the jump action p s t+1 |s t , a t = ′ jump ′ were not given explicitly during training, we assumed an experience-based learning process of the corresponding state transition probabilities for the high and low noise condition respectively. The belief about the probability that a jump at trial t will be successful ρ t = p s t+1 = ′ target ′ |s t , a t = ′ jump ′ was updated using the temporal difference rule: depending on the experienced success (o t = 1) or failure (o t = 0) of a jump. The learning rate parameter alpha ( α ∈ [0, 1] ) modelled how fast participants changed their beliefs about the probability of transition success. Larger values of alpha could also be interpreted as faster forgetting of prior experience and stronger reliance on recent outcomes.

Planning depth and parameter inference.
To infer the four described free model parameters, inverse temperature beta, response bias theta, learning rate alpha and planning depth ( d ), from participants' choices, we used a hierarchical probabilistic model of free parameters and the approximate Bayesian inference scheme for computational feasibility. For a detailed description of the parameter inference, please refer to the supplementary material (Eq. S4 ff.). In simplified terms, we combined the above-described choice probabilities of the three RL agent components with planning depth d ∈ {1, 2, 3} in the form of a mixture model. The likelihood of our probabilistic model, i.e. the probability of choosing a specific action in a given mini-block, was thus defined as: where p(d b = d) denotes the probability each planning depth has and acts as a weight of the choice probability of the corresponding agent in the mixture model, where p(d b = 3) was set to zero for two-stage mini-blocks. Moreover, p(a b |s b ) and p(a b |s b , d b = d) are also functions of the model parameters (β, θ , α) described in the previous section (compare Eq. S6). These had to be marginalized out in order to retrieve a marginal posterior distribution over planning depth. As illustrated in Fig. 2, the mean of marginal posterior samples of planning depth for one mini-block was a categorical distribution. We calculated the mean of this distribution to get a mean planning depth per mini-block. Importantly, we assumed that forward planning should mostly happen before the first action of each mini-block. Hence, for the model inversion (parameter inference), we have constrained behavioural data only to the first choice in each mini-block being either move or jump.
To account for the limited amount of behavioral data, and for expected within group similarities in participants responses, we designed the probabilistic generative model in a hierarchical fashion with group-level, subject-level and condition-level priors. The parameters beta, theta and alpha were then modelled on the subjectlevel, while planning depth d was modelled on the mini-block-level. As an analytical solution for the posteriors of the parameters was intractable and also Markov chain Monte Carlo methods were computationally infeasible, we instead used the stochastic variational inference scheme from the probabilistic programming library Pyro v1.5.2 23 to approximate the posterior distributions. Importantly, we ensured using simulations that assumptions we made for approximate posteriors are qualitatively good and that we can recover both parameters and planning depth with high accuracy. For this, we fixed model parameters to the inferred values and simulated behavioral responses for the same number of subjects. Using the same inference scheme, we again inferred posterior parameter distributions from these responses and validated that the true parameter values fall within the 95% credible interval of the estimates. Finally, we also validated that we can recover the actual planning depth on individual trials by ensuring that for at least 95% of posterior samples, the true planning depth had the highest probability.

Statistical analysis.
Having formalized the within-subject cognitive mechanism with a generative model (Eqs. 1-5), we then followed the standard procedure and next analyzed estimated parameters and additional between-subject variables with classical statistical tests. To analyze mean planning depths, we first calculated mean values for each subject and condition to compare groups and experimental conditions. To analyze the differences between groups in more detail, we set up a linear mixed effects model with random intercept and random slopes to test the effects of age group, noise and steps condition on planning depth. We also included condition-by-group interaction terms. In a second analysis, we tested to what extent mean planning depths can be explained by performance in tasks measuring fluid cognitive abilities. For this purpose, we aggregated subject-wise mean planning depths over the whole task and linearly regressed them on the performance outcomes of the IDP and the SWM as well as a group indicator. To investigate the role of planning time, we also included SAT reaction time as a predictor. A detailed description of the models can be found in the supplementary material (Eq. S20 ff.). As planning depth was the main focus of our study, we decided to use a simple two-sample To evaluate the quality of our computational model of participants' choices in the SAT as described in the previous section, we analyzed how well the fitted model parameters explain variance in behavior. For this purpose, we linearly regressed SAT performance on all computational model parameters (mean planning depth, alpha, beta, theta). All statistical analyses were carried out using IBM SPSS Statistics (Version 28) with a significance level of α = 0.05. For outlier analysis, we decided to exclude mini-blocks with an RT below 150 ms, which is a common timeframe for solely perceptual and motor processes 24 .

Results
Descriptive statistics and group comparisons are depicted in Table 1. During outlier analysis, none of the SAT mini-blocks had to be excluded. However, there was one trial in the SAW and in the SWM with a reaction time below 150 ms that had to be excluded. Planning depth and performance. The overall mean planning depth in older adults was approximately 0.3 steps lower compared to younger adults, see Fig. 3A. When taking into account intercept and noise in the linear mixed effects analysis (estimated parameters depicted in Table 2), this difference was statistically significant as indicated by the fixed group effect, t(53.90) = − 5.10, p < .001. Moreover, the number of steps significantly predicted mean planning depth, such that an increase in the number of actions that could be performed sequentially lead to deeper planning, t(52.14) = 14.93, p < .001. Although the steps effect showed considerable variation between subjects, there was a strong positive steps effect for almost all subjects. The effect of the noise condition was not significant, t(104.31) = .52, p = .607, i.e. participants did not change their planning depths when exposed to the condition with high uncertainty on jumps. This effect also did not show relevant between-subject variation. The interaction terms did not yield any significant effect for group*noise, t(104.43) = 1.16, p = .249, or group*steps, t(52.14) = .74, p = .465. Performance (percentage of maximum possible points gained) was significantly lower for older adults, t(50) = 4.22, p < .001 (see Fig. 3B). There was no significant difference in mean reaction times between groups, only a visual trend of tendentially faster mean reaction times for older adults, t(50) = 1.94, p = .058 (see Fig. 3C). Across conditions and groups, longer reaction times in the SAT were associated with deeper mean planning depths (r = .512, p < .001) and higher SAT performance (r = .624, p < .001). SAT performance and planning depth also showed a significant positive correlation (r = .861, p < .001). A correlation analysis revealed significant associations of mean planning depth with IDP performance (r = .37, p < .01) and SWM performance (r = .32, p < .05). Including these covariates in a linear regression model of subject-wise mean planning depths (see Table 3 for parameter estimates) did not yield any significant effect for these predictors. Controlling for these covariates as well as speed in the SAT, the group variable still indicates a significant difference in planning depth between groups.
Model parameters. We found that values for learning rate α were overall close to zero suggesting stable beliefs about transitions uncertainties during the experiments. Learning rates for older adults were significantly lower compared to younger adults, t = 3.26, p ≤ .01 (see Fig. 3 D). Older adults showed a significantly lower inverse temperature β , t(34.74) = 3.97, p < .001 indicating higher response noise. The response bias θ differed significantly between groups, t(50) = − 3.35, p < .01. This means that younger adults showed a significant response bias towards the deterministic "move" action indicated by a negative theta value, t(24) = − 3.37, p < .01, whereas older adults showed no bias significantly different from zero, t(26) = 1.92, p = .066. All model parameters (mean planning depth, α , β , θ ) explained 88.2% of the variance in SAT performance (R 2 = .882) with a significant contribution of mean planning depth (Beta = .528, p < .001) and β (Beta = .500, p < .001). Detailed results of the underlying linear regression can be found in the supplementary material.

Discussion
In this study, we aimed at assessing how planning depth is modulated by old age. We found that in our sequential decision-making task that required forward planning in order to maximize outcomes, performance was lower in older adults compared to younger adults, i.e. they collected less points. Using a model-based analysis, we found in older adults lower inferred planning depths and higher response noise. Notably, older adults also showed substantially lower performance in cognitive tasks measuring processing speed, working memory and in our forward planning task slightly (trend-wise significant) faster reaction times. Even when controlling for lower processing speed, spatial working memory and reaction times, we still found the robust result of lower planning depth in older adults.
Our main finding of reduced planning depth in older adults is in line with established findings of cognitive aging 17 and evidence from classical planning tasks 8 . Moreover, planning depth of both groups increased with task complexity in the form of a deeper decision tree (2-steps to 3-steps condition) while the difference between age groups did not change (group*steps interaction). Interestingly, in the 3-steps condition most subjects, on average, planned ahead 2 or more steps, but planned ahead less than 2 steps in in the easier 2-steps condition. This indicates that in the SAT, not only the ability to plan but also invested effort plays a crucial role. This assumption is well in line with recent proposals of how cognitive control is regulated such as the Expected Value Table 1. Descriptive statistics and group comparison of model parameter and task outcomes. Scores represent means and standard deviations (in parenthesis). For the Space Adventure Task, performance was defined as percentage of maximum possible points gained. For the other tasks, performance indicated percentage of correct trials. a Z-statistic and corresponding asymptotic p-value based on non-parametric Mann-Whitney-U test as groups' distributions deviated strongly from normality. b Pearson's chi-squared test statistic with one degree of freedom and corresponding p-value.  Raincloud plots of estimated model parameters at the subject-level: learning rate alpha, inverse temperature beta and response bias theta. Note that for alpha, the y-axis is transformed to the log-scale for better visualization of the densities.  25,26 : if people plan in a resource-rational manner, they should balance the costs and benefits of investing cognitive resources to plan deeper. On the cost side, forward planning was probably more demanding for older adults because they had to compensate for more limited cognitive resources. On the benefit side, evidence from neuroimaging suggests that reward sensitivity is shifted from monetary reward to social reward in old age 27 . Because participants in the SAT were rewarded with a performance dependent amount of money, the benefits of planning might have been lower for older adults. Therefore, the expected value of control/planning was probably reduced in older adults which might have led to lower motivation and planning depth, or the use of simpler, less costly heuristical forward planning algorithms 20 . This would also explain why older adults reacted faster in the SAT in our sample and the association of RT and planning depth in our study. RT is a well-known indicator of cognitive cost of the underlying computations 28 . Nevertheless, the group differences in planning depth remained significant when controlling for reaction time. This indicates that additional differences are at play and that group differences are not just due to a difference in trading off planning depth and speed. Explicitly modelling such hypothetical alternative planning algorithms in future model comparison studies of forward planning could provide further insight. Differences in effort allocation between both age groups might also explain why neither spatial working memory nor processing speed were significant predictors of mean planning depth. The tasks used to measure general cognitive performance are not reward-based but motivate participants with time pressure. Therefore, weighing up effort and reward might play a minor role. In the SAT however, we assume that especially in the 2-steps condition individual limits of storage were mostly not reached due to reduced effort and reduced allocation of cognitive resources.
A second explanation of lower task performance suggests that older adults, although extensively trained, build noisier task representations [29][30][31] . This effect could also underlie lower inferred planning depth. For all applied RL agent models, correct task knowledge was assumed but older adults might have forgotten aspects of the task during the course of the experiment. Noisy or erroneous task representations with still extensive planning could result in similar choices as lower planning depth with correct task models. It is even conceivable that these mechanisms are intertwined: noise in representations could accumulate during forward planning computations and lead to less precise outcome predictions with deeper planning, which would effectively result in a gradual limitation of planning depth. Another line of argument from Jiang et al. 32 elaborates further. These authors show mathematically that in the case of incorrect task models, limitation of planning depth can even be beneficial as it can avoid overfitting in policy selection. In other words, they show that it is useful not to set up far-reaching plans on imprecise knowledge. Transferred to the SAT, this means that participants might continuously estimate the uncertainty of their belief over the task structure, e.g. the travel pattern or transition probabilities. This uncertainty is presumably higher in older adults and therefore they might have dynamically adapted their planning horizons. Unfortunately, the set of possible incorrect or noisy task representations is extremely large making exact inference practically infeasible. In the context of the present study, deviations from correct task representations might therefore instead partly be reflected in lower inverse temperature values, which was the case for older adults.
A third explanation for reduced task performance is that group differences could also have been driven by processes independent from the computation of the prediction, e.g. by constantly increased response noise. The latter is well captured by the inverse temperature parameter, which indeed was decreased in older compared to younger adults. This could explain why we only found main effects of age group and task complexity but no group*steps interaction effect. Moreover, this parameter was-besides planning depth-the best predictor of performance in the SAT, clearly indicating a central influence of response noise on outcomes of forward planning. Though higher response noise well accounts for lower task performance (more random choices), it does not explain lower planning depth, since both parameters of the computational model are theoretically independent.
Limitations. Contra-intuitively, inferred planning depth was almost equal between the high and low noise condition for both groups. This is inconsistent with a previous study that found decreased model-basedness when state transitions were less predictable in the two-step task 33 . Post-hoc simulations revealed that this result was most likely caused by a suboptimal design of the task: high noise mini-blocks yielded on average higher rewards and were presented at a later stage during the experiment. Thus, motivation and learning effects are potential confounding variables for the noise effect. We therefore refrain from any interpretations here and will instead work on an improved version to address this in the future. www.nature.com/scientificreports/ Furthermore, we cannot exclude that older adults might have forgotten aspects of the task during the course of the experiment. For future studies, we therefore aim at using a simplified version of the task to minimize this potential confound.
There are two other potential limitations of the current work that we wish to address: first, the cross-sectional design of our study did not control for systematic differences in computer (game) experience between the two cohorts. This could be a significant confounder of the age effect which should be considered in further investigations, e.g. by longitudinal studies or by attempts of measuring the amount of computer experience directly.
Second, we propose a limitation of planning depth as a mechanism to keep model-based control affordable and our computational model showed good validity as the model parameters could explain most of the variance in planning performance. However, we want to highlight that our model just captures one example out of a variety of possible planning algorithms and that humans most likely apply multiple strategies 20,29 , which seems to be especially relevant for older adults. This is also indicated by the inverse temperature parameter which-besides response noise-can also capture mismatch between model and data. Higher response noise (lower inverse temperature) in older adults could therefore also point towards older adults using alternative heuristical strategies which are not covered by our model.

Conclusion
Taken together, our findings strongly point towards an age-related reduction of forward planning that cannot be explained by reduced cognitive abilities (i.e. working memory or processing speed) in older adults. Rather, reduced task performance in older adults seems to be partly due to higher randomness of their choices (higher response noise) and reduced depth of forward planning. We speculate that the reduction of planning depth in older adults is driven by reduced allocation of model-based effort, with older adults seemingly applying lowercost strategies. Noisier task representations might also play a role in this process. We assume that additional strategic differences are at play and that group differences in planning depth are not just due to differences in trading off costs and benefits of planning. Explicitly modelling and testing these hypothetical alternative planning algorithms in future studies of forward planning could shed more light on this issue.

Data availability
All data generated or analysed during this study along with the code and scripts necessary to perform the modelbased inference and statistical analyses are available in the plandepth_age Github repository, https:// github. com/ jeffe nsen/ pland epth_ age.